A/B Testing

Experiment Objective:

The hypothesis is that asking students up front how many hours they can commit to the course will reduce the number of frustrated students who enroll and later leave because they lack the time. If this happens without a large decrease in the number of students who enroll, Udacity coaches can counsel a more focused group of students and improve their chances of completing the course successfully.

Experiment Design

Metric Choice

List which metrics you will use as invariant metrics and evaluation metrics here. (These should be the same metrics you chose in the "Choosing Invariant Metrics" and "Choosing Evaluation Metrics" quizzes.) For each metric, explain both why you did or did not use it as an invariant metric and why you did or did not use it as an evaluation metric. Also, state what results you will look for in your evaluation metrics in order to launch the experiment.

Invariant Metrics

| Metric | Explanation | dmin |
|---|---|---|
| Number of Cookies | Invariant, because cookies are assigned before the experiment starts and are not affected by it. They should be distributed randomly and approximately equally between the control and experiment groups. | 3000 |
| Number of Clicks | Invariant, because the click happens before the user sees the experiment, so it is not affected by it. | 50 |
| Click-Through Probability (CTP) | $$Click Through Probability= \frac{Number Of Clicks}{Number Of Cookies}$$ Because both the number of clicks and the number of cookies are invariant, CTP should also be invariant. | 240 |

Evaluation Metrics

| Metric | Explanation | dmin |
|---|---|---|
| Gross Conversion (GC) | $$GC=\frac{Number Of User IDs To Enroll In The Free Trial}{Number Of Cookies To Click Start Free Trial}$$ The experiment may reduce the number of user IDs that enroll while the number of clicks stays invariant, so GC should drop. GC therefore gives a relative measure of the decrease in frustrated students who would eventually leave the course. | 0.01 |
| Net Conversion (NC) | $$NC=\frac{Number Of User IDs To Make Payment}{Number Of Cookies To Click Start Free Trial}$$ The numerator should not decrease much and the number of clicks is invariant, so at most a small decrease in NC is expected. NC therefore gives a relative measure of the students who remain enrolled long enough to pay, and so are likely to complete the course successfully. | 0.0075 |

Other Relevant Metrics

| Metric | Explanation | dmin |
|---|---|---|
| Number of User IDs | Not an invariant metric, since the control group is expected to have more user IDs than the experiment group. It could be an evaluation metric, but GC and NC give relative measures, which are better options. | 50 |
| Retention (R) | $$R=\frac{Number Of User IDs To Make Payment}{Number Of User IDs To Enroll In The Free Trial}=\frac{Net Conversion(NC)}{Gross Conversion(GC)}$$ Retention can measure whether the screener had an effect on the 14-day dropout rate, so it could be an evaluation metric. However, it is redundant once GC and NC are chosen, and its unit of analysis (user ID) differs from the unit of diversion (cookie), which leads to higher variability. | 0.01 |
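As a quick check of the identity $R = NC / GC$, the baseline values reported in the standard deviation table of this write-up (gross conversion 0.20625, net conversion 0.1093125) can be divided directly. This is only an arithmetic sketch, and the variable names are illustrative.

```python
# Baseline probabilities taken from the standard deviation table in this report.
gross_conversion = 0.20625   # enrollments / clicks
net_conversion = 0.1093125   # payments / clicks

# Retention = payments / enrollments = NC / GC (the click counts cancel out).
retention = net_conversion / gross_conversion
print(round(retention, 4))
```

The result agrees with the baseline retention of 0.53 used for the standard deviation estimate.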

Results that confirm the launch of the experiment:

| Metric | Explanation |
|---|---|
| Gross Conversion | A practically significant decrease in GC in the experiment group compared to the control group, which indicates that students are taking their time commitment into account before enrolling. |
| Net Conversion | NC should not fall below the practical significance boundary. Maintaining net conversion supports launching the experiment. |

Measuring Standard Deviation

List the standard deviation of each of your evaluation metrics. (These should be the answers from the "Calculating standard deviation" quiz.) For each of your evaluation metrics, indicate whether you think the analytic estimate would be comparable to the the empirical variability, or whether you expect them to be different (in which case it might be worth doing an empirical estimate if there is time). Briefly give your reasoning in each case.

| Metric | Probability | N | $$ Std Of Error_{40k} = \sqrt{\frac{P(1-P)}{N}} $$ | $$ Std Of Error_{5k} =Std Of Error_{40k}\times \sqrt{\frac{40000}{5000}}$$ |
|---|---|---|---|---|
| Gross Conversion | 0.20625 | 3200 | 0.007152599 | 0.0202 |
| Retention | 0.53 | 660 | 0.01942741 | 0.0549 |
| Net Conversion | 0.1093125 | 3200 | 0.005515979 | 0.0156 |

The unit of diversion is the cookie, and the denominator of both GC and NC is the number of cookies that click "Start free trial". The unit of analysis therefore equals the unit of diversion, so it is reasonable to assume that

$$ \sigma^2_{analytical} \approx \sigma^2_{empirical} $$

Hence an empirical estimate is not needed for these two metrics.
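The analytic estimates in the table can be reproduced with a few lines of Python. The sketch below assumes a binomial model for each metric and uses the baseline probabilities and counts from the table; the function names are illustrative, not part of any library.

```python
import math

def analytic_se(p, n):
    """Binomial standard error sqrt(p * (1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)

def rescale_se(se, n_from, n_to):
    """Rescale a standard error from a sample of n_from pageviews to n_to pageviews."""
    return se * math.sqrt(n_from / n_to)

# (baseline probability, unit-of-analysis count per 40,000 pageviews), from the table
baselines = {
    "Gross Conversion": (0.20625, 3200),
    "Retention": (0.53, 660),
    "Net Conversion": (0.1093125, 3200),
}

se_5k = {}
for name, (p, n) in baselines.items():
    se_40k = analytic_se(p, n)
    se_5k[name] = rescale_se(se_40k, 40000, 5000)
    print(f"{name}: SE_40k={se_40k:.6f}, SE_5k={se_5k[name]:.4f}")
```

Rescaling by $\sqrt{40000/5000}$ is equivalent to recomputing the standard error with the counts divided by 8.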

Sizing

Number of Samples vs. Power

Indicate whether you will use the Bonferroni correction during your analysis phase, and give the number of pageviews you will need to power you experiment appropriately. (These should be the answers from the "Calculating Number of Pageviews" quiz.)

Bonferroni correction:

The Bonferroni correction is not used because the launch decision requires both evaluation metrics to meet their criteria, not just one of them. Bonferroni is also often too conservative when the tracked metrics are correlated and tend to move together.

Statistical power (1-$\beta$)= 80% & Significance level ($\alpha$)= 5%

| Evaluation Metric | Baseline conversion rate | Minimum detectable effect | Clicks on "Start free trial" needed per sample | Pageviews per sample | Pageviews for total experiment |
|---|---|---|---|---|---|
| Gross conversion | 20.625% | 1% | 25,835 | 322,937.5 ($25,835 \times \frac{40000}{3200}$) | 645,875 (322,937.5 $\times$ 2) |
| Net conversion | 10.93125% | 0.75% | 27,413 | 342,662.5 ($27,413 \times \frac{40000}{3200}$) | 685,325 (342,662.5 $\times$ 2) |

The larger of the two requirements, 685,325 pageviews, is enough to power the experiment for both metrics.
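The conversion from required clicks to required pageviews is simple arithmetic, sketched here under the baseline used in this report (3,200 clicks per 40,000 pageviews, two groups). The click counts are the ones reported in the table; the function and variable names are illustrative.

```python
# Baseline figures used in this report: 3,200 clicks per 40,000 pageviews.
PAGEVIEWS_PER_DAY = 40000
CLICKS_PER_DAY = 3200

def total_pageviews(clicks_per_group, n_groups=2):
    """Pageviews needed across all groups to collect the required clicks per group."""
    pageviews_per_group = clicks_per_group * PAGEVIEWS_PER_DAY / CLICKS_PER_DAY
    return pageviews_per_group * n_groups

# Clicks needed per group, as reported in the sizing table.
clicks_needed = {"Gross conversion": 25835, "Net conversion": 27413}

required = {metric: total_pageviews(c) for metric, c in clicks_needed.items()}
experiment_size = max(required.values())  # must satisfy the most demanding metric
print(required, experiment_size)
```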

Duration vs. Exposure

Indicate what fraction of traffic you would divert to this experiment and, given this, how many days you would need to run the experiment. (These should be the answers from the "Choosing Duration and Exposure" quiz.)

Give your reasoning for the fraction you chose to divert. How risky do you think this experiment would be for Udacity?

To decide the duration, the exposure must first be set according to the risk of the experiment. If the experiment risks affecting students negatively, only a small fraction of traffic should be diverted at first to observe how participants respond, and the traffic can then be increased accordingly. Simply asking students how much time they can commit does not cause major harm, so Udacity can divert 100% of traffic to get results as fast as possible.

$$Pageviews / day = 40000$$

$$Total\ days= \frac{685325}{40000} \approx 17.13, \text{ rounded up to } 18 \text{ days}$$
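The duration follows from the total pageviews, the daily traffic, and the fraction of traffic diverted. The sketch below is a minimal illustration; the 50% case is a hypothetical alternative added only to show how lower exposure lengthens the experiment.

```python
import math

def days_needed(total_pageviews, pageviews_per_day, traffic_fraction):
    """Whole days required when only a fraction of daily traffic is diverted."""
    return math.ceil(total_pageviews / (pageviews_per_day * traffic_fraction))

full = days_needed(685325, 40000, 1.0)   # exposure chosen in this report
half = days_needed(685325, 40000, 0.5)   # hypothetical lower-exposure alternative
print(full, half)
```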

Experiment Analysis

Sanity Checks

For each of your invariant metrics, give the 95% confidence interval for the value you expect to observe, the actual observed value, and whether the metric passes your sanity check. (These should be the answers from the "Sanity Checks" quiz.)

For any sanity check that did not pass, explain your best guess as to what went wrong based on the day-by-day data. Do not proceed to the rest of the analysis unless all sanity checks pass.

| Metric | Total events (control) | Total events (experiment) | Prob given | Expected prob | Prob observed | SD | m = SD × z(95%) | CI upper | CI lower | Passed |
|---|---|---|---|---|---|---|---|---|---|---|
| No. of cookies | 345543 | 344660 | N/A | 0.5 | 0.5006 | 0.0006 | 0.001179608 | 0.5012 | 0.4988 | Yes |
| No. of clicks | 28378 | 28325 | N/A | 0.5 | 0.5005 | 0.00209 | 0.004116 | 0.5041 | 0.4958 | Yes |
| CTP | 345543 | 344660 | $P_{pool}=0.08215$ | $P_{diff}=0$ | 0.000056620 | $SD_{pool}=0.00066$ | 0.001295679 | 0.00129 | -0.00129 | Yes |

The sanity check for CTP uses the pooled probability and the pooled standard deviation:

$$ P_{pool}= \frac{28378+28325}{345543+344660}= 0.08215$$

$$SD_{pool}=\sqrt{0.08215 \times (1-0.08215)\times \frac{345543+344660}{345543 \times 344660}}=0.00066$$

The observed difference in probability, 0.000056620, lies within the bounds $\pm 0.00129$, so the two probabilities are consistent with coming from the same population.
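The three sanity checks can be sketched in Python as follows. The counts are the totals from the table, $z = 1.96$ is assumed for the 95% interval, and the function names are illustrative rather than taken from any library.

```python
import math

Z_95 = 1.96  # two-sided 95% critical value of the standard normal

def count_sanity_check(n_control, n_experiment, p_expected=0.5):
    """Check that the control share of a count lies in the 95% CI around p_expected."""
    total = n_control + n_experiment
    sd = math.sqrt(p_expected * (1 - p_expected) / total)
    margin = Z_95 * sd
    observed = n_control / total
    lower, upper = p_expected - margin, p_expected + margin
    return lower, upper, observed, lower <= observed <= upper

def ctp_sanity_check(clicks_c, cookies_c, clicks_e, cookies_e):
    """Check that the difference in click-through probability is within the pooled margin."""
    p_pool = (clicks_c + clicks_e) / (cookies_c + cookies_e)
    sd_pool = math.sqrt(p_pool * (1 - p_pool) * (1 / cookies_c + 1 / cookies_e))
    margin = Z_95 * sd_pool
    diff = clicks_e / cookies_e - clicks_c / cookies_c
    return p_pool, margin, diff, abs(diff) <= margin

cookies = count_sanity_check(345543, 344660)
clicks = count_sanity_check(28378, 28325)
ctp = ctp_sanity_check(28378, 345543, 28325, 344660)
print(cookies, clicks, ctp, sep="\n")
```

Each function returns the interval (or margin), the observed value, and a boolean indicating whether the check passes.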

Result Analysis

Effect Size Tests

For each of your evaluation metrics, give a 95% confidence interval around the difference between the experiment and control groups. Indicate whether each metric is statistically and practically significant. (These should be the answers from the "Effect Size Tests" quiz.)

| Metric | Expected | Prob cont | Prob exp | Prob diff | $P_{pool}$ | $SE_{pool}$ | $CI_{lower}$ | $CI_{upper}$ | $P_{bound}$ | $Prac_{sig}$ | $Stat_{sig}$ |
|---|---|---|---|---|---|---|---|---|---|---|---|
| GC | 0 | 0.2188 | 0.1983 | -0.0205 | 0.20861 | 0.00437 | -0.02912 | -0.0119 | 0.01 | True | True |
| NC | 0 | 0.1176 | 0.1127 | -0.0049 | 0.115 | 0.003 | -0.0116 | 0.001857 | 0.0075 | False | False |

A result is statistically significant if the confidence interval does not contain zero. It is practically significant if the confidence interval also excludes the practical significance boundary $P_{bound}$ (dmin), that is, if you can be confident there is a change that matters to the business. The confidence interval for Gross Conversion, (-0.02912, -0.0119), contains neither zero nor -0.01, so it is both statistically and practically significant. The confidence interval for Net Conversion, (-0.0116, 0.001857), contains both zero and -0.0075, so it is neither statistically nor practically significant.
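The decision rule can be written as a small helper. This sketch uses the confidence intervals and dmin values from the table; it states practical significance slightly more strictly than the text, requiring the whole interval to lie beyond $\pm$dmin, which gives the same verdicts here. The function name is illustrative.

```python
def classify(ci_lower, ci_upper, d_min):
    """Return (statistically significant, practically significant) for a CI on a difference."""
    # Statistically significant: the interval excludes zero.
    stat_sig = not (ci_lower <= 0 <= ci_upper)
    # Practically significant: the whole interval lies beyond -d_min or +d_min.
    prac_sig = ci_upper < -d_min or ci_lower > d_min
    return stat_sig, prac_sig

results = {
    "GC": classify(-0.02912, -0.0119, 0.01),
    "NC": classify(-0.0116, 0.001857, 0.0075),
}
print(results)
```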

Sign Tests

For each of your evaluation metrics, do a sign test using the day-by-day data, and report the p-value of the sign test and whether the result is statistically significant. (These should be the answers from the "Sign Tests" quiz.)

| Metric | Sign test p-value | Events with positive change | Total events | $Stat_{sig}$ (α = 0.05) |
|---|---|---|---|---|
| Gross conversion | 0.0026 | 19 | 23 | Yes |
| Net conversion | 0.6776 | 13 | 23 | No |

I calculated the p-values using the online calculator at https://graphpad.com/quickcalcs/binomial1.cfm.
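The same p-values can be obtained without the online calculator by using an exact two-tailed binomial test with success probability 0.5. The sketch below uses only the Python standard library (`math.comb`, available from Python 3.8), and the function name is illustrative.

```python
from math import comb

def sign_test_p_value(successes, n):
    """Two-tailed exact binomial p-value under H0: P(success) = 0.5."""
    lower_tail = sum(comb(n, i) for i in range(0, successes + 1)) / 2 ** n
    upper_tail = sum(comb(n, i) for i in range(successes, n + 1)) / 2 ** n
    # Double the smaller tail; valid because the null distribution is symmetric.
    return min(1.0, 2 * min(lower_tail, upper_tail))

p_gc = sign_test_p_value(19, 23)  # gross conversion: 19 of 23 days
p_nc = sign_test_p_value(13, 23)  # net conversion: 13 of 23 days
print(round(p_gc, 4), round(p_nc, 4))
```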

Summary

State whether you used the Bonferroni correction, and explain why or why not. If there are any discrepancies between the effect size hypothesis tests and the sign tests, describe the discrepancy and why you think it arose.

The Bonferroni correction is not used because the launch decision requires both evaluation metrics to meet their criteria. If a significant result on any single metric were enough to trigger the launch, the Bonferroni correction would be appropriate. There are no discrepancies between the effect size hypothesis tests and the sign tests: Gross Conversion is statistically significant in both tests, whereas Net Conversion is not statistically significant in either.

Recommendation

Make a recommendation and briefly describe your reasoning.

The experiment shows a significant decrease in Gross Conversion, which suggests the screener filters out students with too little time, so Udacity coaches have more capacity for a more focused group of students. Net Conversion, however, is not statistically significant, and its 95% confidence interval includes the negative practical significance boundary of -0.0075. I would not recommend launching this change, since revenue could drop if net conversion falls below the practical significance boundary, which is too risky at 100% of traffic.

Follow-Up Experiment

Give a high-level description of the follow up experiment you would run, what your hypothesis would be, what metrics you would want to measure, what your unit of diversion would be, and your reasoning for these choices.

In order to reduce the number of frustrated students we can perform the following experiment:

We design a quick test, about half an hour long, based on the prerequisites for the Nanodegree. This test achieves two things:

It makes students aware of the prerequisite knowledge they need in order to complete the Nanodegree successfully.

It gives Udacity an idea of each student's ability. Since students differ in ability, combining the number of hours they can dedicate to the course with a measure of their ability lets us decide whether they should start the course or access the free materials instead.

Based on the test score, we can also give each student a rough estimate of how many hours they would need to spend on the course.

After clicking "Start free trial", users in the experiment group are asked to take the test before moving further. They are then told that passing the test and being able to dedicate more than the minimum required number of hours makes them an ideal candidate for the Nanodegree.

Hypothesis: Students who pass the test and whose available hours are above the minimum threshold are more likely to remain enrolled in the Nanodegree and make their first payment.

Unit of Diversion: Cookies initially; after enrollment in the Nanodegree program, users can be tracked by user ID.

Invariant Metrics:

Number of Cookies: Invariant, because cookies are assigned before the experiment starts and are not affected by it. They should be distributed randomly and approximately equally between the control and experiment groups.

Number of students clicking "Start free trial": Invariant, because the click happens before the experiment takes effect. The button appears before Udacity suggests taking the test, so clicks should be distributed randomly and approximately equally between the control and experiment groups, which makes this a good invariant metric.

Evaluation Metrics:

Net Conversion: $$NC=\frac{Number Of User IDs To Make Payment}{Number Of Cookies To Click Start Free Trial}$$ This gives a relative measure of the students who are likely to complete the course successfully. Users in the experiment group who take the test get an idea of the prerequisite knowledge, and the amount of time suggested by Udacity gives them the foresight to decide whether to join the Nanodegree program.

Course Completion Ratio: The ratio of the number of students completing the Nanodegree program to the number of students making their first payment.

Most Nanodegree students are not sure about the prerequisite knowledge or the number of hours they can dedicate to the Nanodegree. Although they make their first payment, they eventually find it very hard to continue and quit partway through the course. Users in the experiment group who have taken the test have an idea of the prerequisite knowledge, and the amount of time Udacity suggests based on their test score gives them the foresight to complete the Nanodegree successfully. Hence, the course completion ratio can be used as an evaluation metric.

